Graph kernels for chemical informatics

Authors

  • Liva Ralaivola
  • Sanjay Joshua Swamidass
  • Hiroto Saigo
  • Pierre Baldi

Abstract

Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graph structures of variable size. Here, we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and on counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems, predicting mutagenicity, toxicity, and anti-cancer activity on three publicly available datasets. The kernels achieve performance at least comparable, and most often superior, to that previously reported in the literature, reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as of other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.
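The abstract only outlines how path fingerprints and the Tanimoto and MinMax kernels fit together, so the Python below is a minimal sketch under our own assumptions, not the authors' implementation. The molecule format (`Molecule`), the helper names `path_counts`, `tanimoto` and `minmax`, reading the depth d as a number of bonds, and storing each path under the smaller of its two traversal directions are all hypothetical choices. Tanimoto is computed on binary fingerprints as |P(x) ∩ P(y)| / |P(x) ∪ P(y)|, and MinMax on path counts as the sum of per-path minima divided by the sum of per-path maxima. The Hybrid kernel is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format (an assumption, not the paper's): a list of atom
# labels plus an adjacency map {atom index: [(neighbor index, bond label), ...]}.
Molecule = Tuple[List[str], Dict[int, List[Tuple[int, str]]]]
Label = Tuple[str, ...]


def path_counts(mol: Molecule, depth: int) -> Counter:
    """Count labeled simple paths of up to `depth` bonds by running a
    depth-first search from every atom (hypothetical helper)."""
    atoms, adj = mol
    counts: Counter = Counter()

    def dfs(v: int, visited: set, label: Label, bonds: int) -> None:
        # Store each path under the smaller of its two directions so that
        # C-O and O-C map to the same feature.
        counts[min(label, label[::-1])] += 1
        if bonds == depth:
            return
        for w, bond in adj.get(v, []):
            if w not in visited:  # simple paths: never revisit an atom
                visited.add(w)
                dfs(w, visited, label + (bond, atoms[w]), bonds + 1)
                visited.remove(w)

    for start in range(len(atoms)):
        dfs(start, {start}, (atoms[start],), 0)
    return counts


def tanimoto(fx: Counter, fy: Counter) -> float:
    """Tanimoto kernel on binary fingerprints: only path presence matters."""
    x, y = set(fx), set(fy)
    shared = len(x & y)
    union = len(x) + len(y) - shared
    return shared / union if union else 1.0


def minmax(fx: Counter, fy: Counter) -> float:
    """MinMax kernel on path counts (missing paths count as 0)."""
    keys = set(fx) | set(fy)
    num = sum(min(fx[k], fy[k]) for k in keys)
    den = sum(max(fx[k], fy[k]) for k in keys)
    return num / den if den else 1.0


if __name__ == "__main__":
    # Heavy-atom skeletons only (hydrogens omitted): ethanol C-C-O, methanol C-O.
    ethanol = (["C", "C", "O"], {0: [(1, "-")], 1: [(0, "-"), (2, "-")], 2: [(1, "-")]})
    methanol = (["C", "O"], {0: [(1, "-")], 1: [(0, "-")]})
    fe, fm = path_counts(ethanol, 2), path_counts(methanol, 2)
    print(tanimoto(fe, fm))  # 0.6
    print(round(minmax(fe, fm), 3))  # 0.444
```

In this toy example the two molecules share three of five distinct paths, so Tanimoto gives 0.6; MinMax also sees that ethanol contains each shared path more often than methanol does, which lowers the score to 4/9.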

Similar resources

Graph Kernels for Chemoinformatics

In chemoinformatics and bioinformatics, it is effective to automatically predict the properties of chemical compounds and proteins with computer-aided methods, since this can substantially reduce the costs of research and development by screening out unlikely compounds and proteins from the candidates for 'wet' experiments. Data-driven predictive modeling is one of the main research topics in che...

An Application of Boosting to Graph Classification

This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling many kinds of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraphs as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss...

A Netflow Distance between Labeled Graphs: Applications in Chemoinformatics

We propose a novel measure of similarity between labeled graphs with applications to structured data analysis, e.g., chemical informatics and web document clustering. Exact metrics on graphs based on subgraph isomorphism have been proposed earlier, but due to the lack of an efficient algorithm they cannot be applied to large data. Our metric on graphs exploits vertex context sim...

On the Zagreb and Eccentricity Coindices of Graph Products

The second Zagreb coindex is a well-known graph invariant defined as the total degree product of all non-adjacent vertex pairs in a graph. The second Zagreb eccentricity coindex is defined analogously to the second Zagreb coindex by replacing the vertex degrees with the vertex eccentricities. In this paper, we present exact expressions or sharp lower bounds for the second Zagreb eccentricity co...
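Written out, the two invariants described in this summary are as follows. The sums run over unordered pairs of distinct non-adjacent vertices, and ε_G(u) is the eccentricity of u, the largest distance from u to any other vertex of a connected graph G. The symbol for the eccentricity coindex is our own notation and may differ from the paper's.

```latex
\overline{M}_2(G) = \sum_{\{u,v\} \subseteq V(G),\ uv \notin E(G)} \deg_G(u)\,\deg_G(v),
\qquad
\overline{M}^{\,\varepsilon}_2(G) = \sum_{\{u,v\} \subseteq V(G),\ uv \notin E(G)} \varepsilon_G(u)\,\varepsilon_G(v)
```

For the path a-b-c, the only non-adjacent pair is {a, c}, so the second Zagreb coindex is 1 · 1 = 1 and the second Zagreb eccentricity coindex is 2 · 2 = 4.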

An Efficient Sampling Scheme For Comparison of Large Graphs

As new graph-structured data is being generated, graph comparison has become an important and challenging problem in application areas such as molecular biology, telecommunications, chemoinformatics, and social networks. Graph kernels have recently been proposed as a theoretically sound approach to this problem, and have been shown to achieve high accuracies on benchmark datasets. Different gra...

Journal:
  • Neural networks : the official journal of the International Neural Network Society

Volume 18, Issue 8

Publication date: 2005